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1 Introduction 



Practical planning problems with deterministic forecasts of inherently uncertain parameters often 
yield unsatisfactory solutions. Stochastic programming formulations allow uncertain parameters to 
be modeled as random variables with known distribution, but the size of the resulting mathematical 
programs can be formidable. Dantzig [5] and Beale [1] introduced stochastic programming with 
recourse; some example applications from the literature include capacity expansion planning [7], 
forest harvest planning [12], hydroelectric scheduling [20,27], and portfolio management [22,26]. 

Stochastic programming algorithms are typically one of three types: (i) exact solution procedures, 
(ii) approximation and bounding schemes, and (iii) sampling-based methods. Exact methods include 
simplex-based algorithms that exploit special structure of bases [17], decomposition or L-shaped 
schemes [2,31], interior point methods [24], and the Progressive Hedging algorithm [29]. A classic 
approximation scheme involves calculating deterministic lower and upper bounds via the inequalities 
of Jensen and Edmundson-Madansky, respectively; see [3,4,11,21] for extensions and alternatives. 
Stochastic quasigradient (SQG) methods [9,13] are sampling-based. Another set of sampling-based 
algorithms are rooted in the L-shaped method [6,14,19,27]. In many models, as the number of random 
parameters and the number of scenarios grow, exact solution procedures and approximation and 
bounding schemes become more difficult to apply due to required computational effort. Sampling- 
based algorithms may provide an attractive alternative for such models. Stopping criteria for SQG 
methods are examined in [28]. In general, however, sampling-based approaches lack stopping rules 
that can control a priori the quality of the proposed solution. In this paper, we develop rules 
designed to rectify this shortcoming for a particular class of sampling-based algorithms. 

A host of questions arise when one replaces deterministic upper and lower objective function 
bounds generated by a decomposition algorithm with estimates formed from sample means. How 
should an “optimal” solution be characterized and how should the sampling procedure proceed so 
as to ensure an appropriately defined notion of convergence? The primary purpose of this paper 
is to provide a framework in which these issues may be addressed. The stopping rules we develop 
are comprised of two components: (i) a criterion for terminating the algorithm and (ii) a rule for 
selecting the sample sizes. The main results, detailed in §2, provide stopping rules that guarantee 
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asymptotic validity of the desired confidence interval statements as the interval width shrinks and the 
sample sizes grow. Application of these methods to an algorithm for a class of multistage stochastic 
linear programs [27] is described in §3. Empirical coverage results for a simple example are given in 
§4, and the paper is summarized in §5. 

2 Stopping Rules 

This section begins by outlining our framework of assumptions on the underlying sampling-based 
algorithm. Consider the following general optimization problem: 

z* — minimize z(x) ( . 

subject to x £ X. ' 

Suppose we have at hand an algorithm that at each iteration k selects a sample size n*, produces a 
feasible decision z*, and generates estimates for upper and lower bounds on the optimal objective 
function value denoted t/*(n*) and Lk(nk), respectively. It is assumed that at each iteration k the 
difference random variable , ) = Uk{^k) — £*(«* ), satisfies the central limit theorem (CLT): 

y/nk(Dk{n k ) - ^k) => N(0,al) as n k -> oo, where c k > 0 (2) 

and where => denotes convergence in distribution; N(n,a 2 ) denotes a normal random variable with 
mean // and variance g 2 . The sample size parameter will typically be suppressed when referring to 
the upper, lower, and difference random variables. The true upper and lower bounds at iteration 
k are denoted Uk and Ik and satisfy Uk => u* and Lk => Ik as rik — ► oo, where Ik < z* < ujt r 
and (necessarily) /z* = Uk — h- If Uk and Lk are independent and satisfy respective CLT’s then 
hypothesis (2) follows as a consequence. 

The algorithm is terminated on the first iteration, T, in which the difference random variable 
drops below zero; i.e ., 



T = inf {k : D k < 0} . (3) 

The feasible decision xp generated at the random stopping iteration satisfies ut > z(xt )• In 
addition, we assume that, at the stopping iteration, the algorithm permits re-evaluation of the 
difference random variable through independent resampling. The algorithm is said to stop correctly 
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if z(xt) < z* -f e where € is a positive confidence interval width. Stopping correctly is ensured if 
fly < c; we use this observation and the CLT hypothesis (2) to prescribe sample sizes, n*, under 
which a statement can be made regarding the probability that the algorithm stops correctly. At the 
heart of the procedures we develop is the idea that the sample sizes must increase as the algorithm 
proceeds. 

In order to illustrate the underlying ideas in a simple setting, we assume in §2.1 that the difference 
random variables at each iteration are normally distributed. Under this assumption, we show it 
suffices to increase the sample size at a rate proportional to log k to guarantee that the probability 
of stopping correctly satisfies a prescribed confidence level. In §2.2, the normality assumption is 
replaced with the CLT hypothesis (2) and an 0(log 2 k) sample-size formula is provided under which 
the results of §2.1 can be recovered in the form of an asymptotic validity result. Moreover, we indicate 
that with respect to required computational effort, the 0(\og 2 k) sample-size formula is preferable 
to the 0(\ogk) formula. In §2.3 we verify the asymptotic validity result under a weaker history- 
dependent CLT hypothesis in which the difference random variable may depend on the (potentially) 
random history of the algorithm in previous iterations. In §2.4 we address issues associated with 
finite stopping times and incorporation of sample variance estimators. 

2.1 Stopping Rules: Normal Differences 

In this subsection we replace the CLT hypothesis (2) with the following more restrictive assumption: 
At the k ih iteration of the algorithm, we choose a sample size n* and then observe the random 
variable: 



Die ~ N (n k ,crl/n k ) , where <r* > 0. (4) 

Example 1 This example indicates that a fixed sample size, n* = n, can lead to unattractive 
results. Suppose t/* = = /i > e for k = 1, . . . , K ; /i* = 0 for k > K 4- 1; and cr* = a > 0 for all k. 

The algorithm will not stop correctly if and only if it terminates prior to iteration K + 1. By 
choosing K sufficiently large we can make the probability we stop correctly, P{D\ > 0 , . . . , jDk* > 
0} = [P{D\ > 0}]*, arbitrarily close to 0. ■ 
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Theorem 2 provides a sample-size formula that overcomes the difficulty illustrated in Example 1; 
in particular, it ensures that the probability the algorithm stops correctly satisfies a minimum 
prescribed confidence level of 1 — a; e.g., 0.95. We require the following lemma regarding bounds on 
the tail of a normal distribution (see Feller Chapter VII §1 [10]). 

Lemma 1 If z > 0 then 

P{N(0,l)>z} < -Lie- 2 / 2 . 

V Z 7T Z 



Theorem 2 Assume (3), (4), and define 

”k > (ffij (/?+ 2pln&) , (5) 

where c > 0, /3 = max { 2 In [C(p)/(V^r a)] , l}, 0 < a < 1, and (( p ) = YlkLi ? P > 1- Then 
P {pr < e) > 1 — a. 



Proof 

Dt < 0 implies 

P{pt > < T{pt > Dt + 

oo 

= > 0, . . . , Dk-i > 0 ,Dk< 0, Dk < Hk - f} 

k = l 

OO 

< Dm <*-«>. 

jfc=i 

Now X2n=i[ a /C(p)] = <*; thus, it suffices to show 



PiDk^fik-^K-^k-P. 

To this end consider 

P{ Dl < w - f) m} 

= P{JV(O t l)>eVnTM} 

< P{W(0,1) > (/? + 2p In A:) 1 / 2 } 
Since /? > 1, the tail bound from Lemma 1 yields: 



P{D k < Hk - e} < = exp 

v2tt 



~2 (/? + 2pln fc) 
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Hence it suffices to show 



-7= <^P ~(0 + 2p]nk) 

V27T L * 

and this inequality follows from the definition of /?. ■ 

Remark 1 The coverage result of Theorem 2 states that [0 >e] is a (1 — a) • 100% confidence interval 
for /it- In terms of the optimization problem (1) this implies: 

P {z(xt) < z* + c} > 1 — a. 

Remark 2 Tables of values of the Riemann-Zeta function, C(p)> ma y be found in Dwight [8]; £(p) 
is also an intrinsic function in some mathematical software packages such as Mathematica [32]. We 
return to the topic of choosing values for the parameter p to minimize the total number of samples 
required in §2.2. 

Remark 3 The term “ k th iteration” should be interpreted liberally; abetter phrase is “ k th stopping 
cycle” with one possible definition as follows: A stopping cycle consists of a number of algorithm 
iterations in which a fixed sample size is used plus one resampling iteration in which the sample- 
size formula (5) is applied. The fixed sample-size phase of a stopping cycle is terminated when 
a heuristic pre-test is passed; e.g., a negative difference is observed. The sample-size formula 
and stopping criterion described above are then applied to a realization of the difference random 
variable generated from an independent sample. This idea, illustrated in the application of §3, helps 
to minimize computational burden. 

Remark 4 The purpose of our analysis is to control the quality of the proposed solution xt- Any 
number of heuristic stopping rules can generate a feasible solution, xt>- By resampling the asso- 
ciated difference random variable, one can form a (1 — 6) • 100% confidence interval of the form 
[z* } z* + Dj>, + W6 <tt> / y/n] for z (zt*) where n denotes the resampling sample size, Wf> satisfies 
P{N(0 y l) < w&) = 1 — 6, and D = max{D:r',0}. (We use D because it is known that /it • > 0.) 
The disadvantage of results based on heuristic stopping rules is that we have inadequate control 
of the interval width, Dj>, + w^vt' / y/n. The primary purpose of Theorem 2 (and the subsequent 
generalizations we present in this paper) is to provide a prion control on the confidence interval 
width. 
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Remark 5 We regard the specific values of z* and z(xt) of secondary interest relative to controlling 
the quality of the proposed solution, xt . However, if Ut and Lt are each normally distributed (or 
more generally satisfy respective CLT’s) then one can also develop confidence intervals for these 
quantities through the resampling procedure. 

Remark 6 The coverage result of Theorem 2 does not depend on any convergence structure of 
the optimization algorithm. The sample-size formula is designed so that if the algorithm does not 
converge then the stopping criterion will not be satisfied with probability, at least, 1 — a. This 
property contributes to the conservative nature of the coverage result since many algorithms have 
some underlying convergence structure. We provide conditions under which finite termination can 
be ensured (with probability one) in §2.4. 

2.2 Stopping Rules: CLT Differences 

In §2.1, under the assumption of normally distributed difference random variables, we derived a valid 
confidence interval for all positive interval widths, e. In this subsection, we replace the normality 
assumption with an asymptotic normality hypothesis (2) and provide conditions under which the 
confidence interval of §2.1 is valid in the limit as the interval width shrinks and the sample sizes 
grow. In particular, we show 

P {2(ar T(e) ) < 2 * +f} > 1 - a, (6) 

where the stopping time, T = T(c) is once again the first iteration in which the observed difference 
drops below zero. 

Example 1 was appropriate under the assumptions of §2.1 only because the confidence interval 
statement was made for all positive interval widths; for sufficiently small e, adequate coverage results 
are achieved. In Example 2 we again use an identical sample size for all iterations and construct 
a problem in which the probability of stopping correctly is zero in the limit as the interval width 
shrinks to zero. 

Example 2 Let Dk ~ N( k" 1 , n* 1 ) be independent and define m = [e -1 ]— 1. (The ceiling operator, 
[•*], yields the smallest integer greater than or equal to its argument.) If the algorithm terminates 
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prior to iteration m + 1 it has stopped incorrectly. Thus, 

P{»T( o< f ) < P{D X > 0 D n > 0} 

m 

k-\ 

Let e = Cn^ 1 ^ 2 > 0 so that the sample size has the same value for all iterations and observe: 

m 

< U p { N (°’ 1)<k ~^ Tn + ls ) c } 

k=l 

m 

< I] / J {A r (0,l)<(m+l)C/rm/2l} 

k=\m/2 1 

< [P {N(0, 1) < 3C}] rm/21 . 

Hence P{pt(c) <f}— ► 0 as e j 0. ■ 

Example 2 demonstrates that a fixed sample size, no matter how large, may lead to “confidence” 
intervals with unsatisfactory coverage properties. The key idea, once again, is that we must increase 
the sample size as the algorithm proceeds; Theorem 3 provides stopping rules under which the 
desired confidence intervals are asymptotically valid. 



Theorem 3 Assume (2), (3), and 



( Dk - Pk\ 

sup E exp 7 —= 

*>1 L \Vk/yJnkJ. 



is bounded for | 7 | < 7 ^. 



(7) 



Let /?' = max {2 In [<f>(p)/(V 27 t a)] , l}, where <t>(p) = \ ^ pln P > {^7 2 ) 1 an d 0 < a < 1. 

If 

k 2 



n k > (^7-) (/?' + 2pln 2 k) 



( 8 ) 



then lim P {pr(e) < f } > 1 — a. 



Proof 



Let Z k = cr k 1 y/rik (D k — /i*). We begin as in the proof of Theorem 2 and see that it suffices to show 



00 

lim]T < -(/?' + 2p]n 2 Jk) ,/2 | < a. 

ei ° *=1 



(9) 
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Next we show the order of the limit and sum on the left hand side of (9) may be exchanged with 
equality by employing the dominated convergence theorem. Applying Markov’s inequality (see, e.g., 
Loeve §9 [23]) to e^ Zk } 7 > 0 we have 

P [z k < -(/?' + 2p In 2 Jfc) 1/2 } < exp [- 7 (/?' + 2pln 2 ik) 1/2 ] E e~ yZk . (10) 

With 7 = 7 0 , the right hand side of (10) is bounded above by {sup fc>1 E e~ y ° Zk } k~ y °^ Zp . As 
loV^P > lj the order of the limit and sum may be exchanged with equality and it remains to show 

00 

lim P {z* < -(/?' + 2pln 2 *) 1/2 } < a. 

*=i ti0 

By applying the CLT (2) we may complete the proof in a fashion analogous to that of Theorem 2. ■ 
Under a weaker hypothesis on the difference random variables, we have recovered the coverage 
result of Theorem 2 in an asymptotic sense. The asymptotic validity result of Theorem 3 may be 
interpreted as follows: For sufficiently small choices of e, [z* , z* + e] is an approximate (1 — a) • 100% 
confidence interval for z(xt ). 

We now address two issues with respect to the hypotheses of Theorem 3. First, we provide 
conditions under which the bounded expectation assumption (7) holds and then we examine the 
function <j>(p). Verification of (7) is straightforward when each difference random variable may be 
expressed as a sample mean of independent and identically distributed random variables (i.i.d.r.v.’s); 
in particular, it suffices to verify that the underlying random variables have moment generating 
functions for (7) < j 0 . Indeed, this observation is at the core of an elementary proof of the central 
limit theorem for i.i.d.r.v.’s (see, e.g., Hogg and Craig [16]). Proposition 4 summarizes this result; 
we use the following notation: For each k , Du* • • • denote i.i.d.r.v.’s, Dk = 1 

EDki = Hk, and E(Dki — fik ) 2 = where 0 < <j\ < 00. 

Proposition 4 If sup E exp 7 ( — — ) < 00 then sup E exp 
Ar>l L \ /J jt>i 

We now turn our attention to <j>(p) = YltLi h~ p[nk . By comparing terms of the <j>(p) series and 

f° r sufficiently large k we conclude <j>(p) < 00 if p > 0. Proposition 5 provides bounds on 

<j>(p) that facilitate numerical function evaluations and further characterizes both <j>{p) and £(p) so 

that we may subsequently address the issue of choosing good values for the parameter p. 



is bounded . 
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Proposition 5 

(i) Let vh = yflp [In N — (2p)“ 1 ] and K p = \Jir/p exp [(4P)” 1 ] , p > 0. Then 



N 



K p P{N(0,1 )>v n+1 ) < < A' P P{^(0,1)>^} 



Ar=l 



Kp ( P {N( 0, 1) >v N )-P {N( 0, 1) > v N+1 }) < N~ phlN . 

(it) lnC(p) and In <p(p) are convex functions on (l,oo) and (0,oo), respectively. 
Proof 

Based on the inequalities 



/*£+ 1 

(k + i)-p ,n (*+0 < J U ~ p]nu du < k~ pink 



we find 



N 



*du. 



r°° _ f°° 

/ u- pinu du < ^(p)-Vlfe-P'n* < / U~ p]nu c 

JN + 1 = 1 J N 

A change of variables yields 

t oo />oo 

/ u-P' nu du = (2p)- 1/2 exp[(4p)- 1 ] / e-^^du - K p P {N{0, 1) > } , 

Vy 

where = \/Sp [In y — (2p) -1 ]. This establishes the first part of (i), and the error bound follows 
from 

roo rCQ 

/ u~ p,nu du - / u~ p]nu du < N~ p inN . 

JN Jn + 1 

For result (ii) we require the infinite series version of Holder’s inequality: 






k=l 



ll/r f 



D-r 



D‘*i‘ 



U= l 



1/5 



where r , s > 1 and l/r-|- 1/s = 1. For the Riemann-Zeta function we must show 



lnC(Api + (1 - A)p 2 ) < AlnC(pi) + (l - A)lnC(p 2 )- 



Using the definition of £(p) we see it suffices to show: 






k=i 



E k 

U=i 



-Pi 



A r 



U=i 



P 2 



1-A 
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This inequality follows from Holder’s inequality. The proof for In <j)(p) is virtually identical. ■ 

The 0(\og 2 k) sample-size formula of Theorem 3 is driven, in large part, by the moment gener- 
ating function hypothesis (7). By making the stronger assumption, 

is bounded for |^y | < 

the O(logJb) sample-size formula of Theorem 2 can be recovered; the difficulty, however, lies in 
developing analogs of Proposition 4. The moment generating function hypothesis is attractive be- 
cause it is easily verified and because with respect to required computational effort, the 0(log 2 k ) 
sample-size formula is preferable to the O(\ogk) formula for practical problems. For the purpose 
of verifying this statement we will assume a jt/e is constant as would be the case if e is selected 
proportional to <7* (see subsequent Remark 7). Minimizing the total number of samples or “work” 
over T iterations then corresponds to minimizing 

W{(p) — T • max |2 In |c(p)/(V^ a)J , 1 j + 2 p In k 

\k=i 



sup E exp 

*>i 



f Dk-tik 'Y 



for sample-size formula (5) or 

W^(p) = T • max |2 In \^>{p)/{s/ 27T a)j , 1 j + 2 p In 2 

for sample-size formula (8). From Proposition 5, part (ii) it is clear that W^(p) and W<p(p) are 
convex functions on (l,oo) and (0,oo), respectively. Table 1 displays the results of minimizing the 
respective work functions for various choices of T with a — 0.05. While T, of course, is unknown a 
priori , rough estimates for T (or ET) may be available for certain classes of problems. 



T 


P* 


formula (5) O(logfc) 

C (pi IW) 


P* 


formula (8) 0( log 2 k ) 

HpI *W) 


10 


1.5 


2.612 375 


106 


0.4 


5.048 588 


96 


25 


1.35 


3.459 237 


323 


0.25 


9.379 868 


291 


50 


1.29 


4.046 193 


731 


0.19 


14.865 183 


660 


10 2 


1.24 


4.761 075 


1 630 


0.155 


22.270 678 


1 473 


10 3 


1.15 


7.254 695 


21 715 


0.09 


94.647 997 


19 720 


10 4 


1.11 


9.676 075 


269 211 


0.065 


325.046 04 


246 153 


10 5 


1.09 


11.694 841 


3 199 021 


0.050 


1 175.999 4 


2 944 556 



Table 1: Optimal Choices of p 
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For each value of T, Table 1 contains the minimized (within 0.1%) work function values, the 
corresponding minimizers, p * , and C(P*) and <£(P*) fo r reference. Table 1 indicates that the 0(log 2 k) 
sample-size formula outperforms the 0(\ogk) formula for values of T dramatically larger than of 
practical interest (see Remark 3). The reason is as follows: Sample-size formula (5) requires p > 1, 
and this leads to a multiplier greater than 2 on the In k term of the work function while the 

corresponding term for W^(p) can have any positive coefficient. Theorem 2, of course, could be 
restated with the 0(log 2 k) formula; the bounding series C(P*) was initially selected because it lead 
to a better rate of growth. When applying Theorem 3, one may not be able to use the recommended 
values of p * due to the p > (27^)“ 1 condition. In many practical problems, however, the underlying 
random variables may be bounded, and under the i.i.d. hypotheses of Proposition 4 one can then 
choose any p > 0. 

Well-designed stopping cycles (Remark 3) help minimize required computational effort by reduc- 
ing the number of times that the sample size is increased. However, we wish to emphasize that even 
under poorly designed stopping cycle schemes the increase in computational effort as the algorithm 
proceeds is relatively modest. As an example, consider c — 2ajt/\/30 and a = 0.05. The sample sizes 
required by the 0(\og 2 k) sample-size formula with p = 0.155 at the 1 st , 10*\ 100 t/l , and lOOO^ 
stopping cycles are 78, 91, 128, and 189. 

2.3 Stopping Rules: History-Dependent CLT Differences 

We can generalize the results of Theorem 3 with respect to the assumptions on the interaction of 
the algorithm with the upper and lower bound estimates. In the development above, we assume 
{Pk^k }^- 1 is simply a sequence of constants, but in many applications these parameters may 
depend on the random history of the algorithm through iteration k — 1 which we denote Tik-i- For 
example, in an L-shaped algorithm for two stage stochastic programs in which cuts are obtained by 
sampling (see Dantzig and Glynn [6]), the sequence of master program decisions depends on which 
scenarios were (randomly) selected to compute cuts in previous iterations. In this example, it is 
clear /i*, er*, and Dk are random variables that are sample-path dependent. A realization of Tik 
may be thought of as the information necessary to reconstruct exactly the steps of the algorithm 
through iteration k. In this more general setting, we will assume that conditioned on the history 
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random variable, the mean and variance are constants and the difference random variables satisfy 



D k - pi b 



< y 






( 11 ) 



l <*k /y/^k 

Theorem 6 Assume (3), (7), (11), and define n k and the corresponding parameters as in Theo 
rem 3 . Then lim P {/^r(e) < € } > 1 — ot. 



Proof 

Let Z k = y /n k (D k — fi k ). We begin as in Theorem 2 and see it suffices to show 

oo 

lim ^ P \z k < -(/?' + 2pln 2 /t) 1 / 2 } < a. 

From the proof of Theorem 3 we know hypothesis (7) permits exchanging the order of limit and 
summation with equality. Now observe 

00 f 

lim P{Z k < -(/?' + 2pln 2 b) 1 / 2 | H k -x } 

it=i fl °“' 

00 r 

= J2 rliS p { Zk - ~ (/? ' + 2pln2 k)l/2 1 Hk ~' 1 

k = i ^ c| ° 
cx> 

= £ 
k= 1 

oo 

= *) ^ “(0' + Spin 2 *) 1/2 } • 

fc=l 

The remainder of the proof is analogous to that of Theorem 2. ■ 



J P {N( 0, 1) < -(/3' + 2pln 2 *) 1/2 } dP Hk _, 



Remark 7 The asymptotic validity results of Theorems 3 and 6 both still hold when we select e 
proportional to <r k \ i.e., a relative precision confidence interval. In particular, with e = ia k the only 
technical modification required is replacing lim C jo with lim c qo- 



Remark 8 A sufficient condition for verifying (7) in the history-dependent setting is 



E exp 



( Dk — jik \ 

\Ok/y/nk J 



n k - 1 



< M for | 7 | < 7 0 , 



( 12 ) 



where M and j 0 do not depend on the iteration k , the history, TL k) ot the confidence interval width, 
e (which determines n k ). This test may often be more natural to apply than attempting to verify 
(7) directly. 
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2.4 Stopping Rules: Additional Considerations 



In this subsection we address two additional issues. First, we describe how sample variance estimates 
may be incorporated in confidence interval construction, and second, we examine the issue of finite 
stopping times. 



2.4.1 Sample Variance Estimators 



If the confidence interval width, e, is a sufficiently small, pre-selected, fixed value then we interpret 
Theorems 3 and 6 as providing an approximate absolute precision confidence interval. However, the 
population variance terms, cr|, are typically unknown; hence n * is unknown and the procedure is 
not implementable. Alternatively, if e is proportional to <7* , we obtain a relative precision confidence 
interval (see Remark 7) with unknown width ear- There are standard approaches to this difficulty 
based on well-known results from parameter estimation in statistics. For simplicity, we describe one 
possible approach in the setting of Theorem 3 for the relative precision case. We replace the CLT 
hypothesis (2) with the following assumption: 

=> AT(0, 1) as n t -> oo, (13) 

Sk/s/n k 

where is a sample variance estimator. We may then form a “sample- variance” analog of Theorem 3 
by replacing with s\ in (7) and (8). However, this result is of limited value because the sample 
variance equivalent of the moment generating function hypothesis (7) may be difficult to verify; if 
Dk is a sample mean of i.i.d. normal random variables then s^ 1 y/rTk (Dk — /^Ar) is a Student’s T 
random variable and does not have a moment generating function. (The situation for an absolute 
precision confidence interval is at best unimproved.) 



A simple solution to this difficulty is as follows: At iteration T we can resample the difference 
random variable at the proposed solution xt (see Remark 4). We denote this random variable Dt 
and assume it satisfies the CLT (2) with mean /ix and variance Coupled with a weakly consistent 
sample variance estimator, this ensures the sample variance version of the CLT hypothesis (13) 
holds for Dt • From this we infer 







WfiST 



(14) 
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is an approximate 100(1 — b)% confidence interval for z(xt) where n denotes the (resampling) sample 
size and W& satisfies P{N(0 , 1) < ws} = 1 — h. As Theorems 3 and 6 ensure pr is not too large, 
we have a 'priori control on the confidence interval width. Restated: For sufficiently small e and for 
sufficiently large n — n(rj) we have 

WfiST 

y/n 

2.4.2 Finite Stopping Times 

It would be undesirable if the stopping rules we have developed precluded finite termination for 
“well-behaved” algorithms. To this end, we introduce the notion of a stopping tolerance e* satisfying 
0 < e f < e and make the following assumptions. The stopping time is redefined as 



P{Dt + 



> c&t 4* 






a for any r) > 0. 



(15) 



T = inf {k : D k < P) . 

*>i 1 - J 



(16) 



Under this termination criterion P{T > m) = P{D X > e', D 2 > P , . . D m > e'}, and we assume 



P{Di > e',D 2 > c’,...,D m > P 




P{D k >P\n k .i)dPH m _ t 



(17) 



which is a natural generalization of inter-iteration independence of the difference random variables 
to the history-dependent setting. With regard to the convergence structure of the algorithm we 
assume there exists a subsequence of {nk} < j?= 1 that converges to zero with probability one; i.e., 



P {u : ?foo(w) implies 3 {^(w)}^! such that /^(w) — ► 0} = 1. (18) 



In this framework the following modification of Theorem 6 incorporates a finite stopping result. 



Theorem 7 Assume (7), (11), (16), P > 0 and define 

nt > (/?' + 2pln 2 k) 

and the corresponding parameters as in Theorem 3 . 

Then 



lim P {^t(c) < c} > 1 — a. (asymptotic validity) 
If, in addition, (12), (17), (18) hold and 0 < e f < e then 

P{T(e) < oo} = 1. (finite stopping time) 



(19) 
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Proof 



By hypothesis (17) 



P{T < 00 } > 1 - [f[P{D k x , \Hk-i}dP Hm _ i . 
J k = 1 



We know from the bounded convergence theorem it suffices to show 



lim TTP {£>* > (' | Hh- 1 } = 0 w.p.l. 

m— *>00 A A 
k—l 

For any e' > 0, condition (12) and — ► 00 as k — * 00 ensures there exists K such that k > K 

implies E {|£>*. — Hk\ | W*_i} < c*/4. Applying (a conditional version of) Markov’s inequality we 
find 

P{\D k -Hk\ > f'/2 \'Hk-i}<E{\D k -it k \\H k - l } /(c'/2). 



Define K m = {k : k <m, k > K, pi k < e'/2} , and observe 

m 

Yl P {D k > (' \n k .i} < n P {|At — n k \ > c'/2 | 'Hk-i} < (1/2) |a: 

k - 1 ktKm 

The finite stopping result follows as (18) implies \K m \ — ♦ 00 as m — > 00 w.p.l. 



The proof of the asymptotic validity result is virtually identical to that of Theorem 6. ■ 



3 Application 



A T-stage stochastic linear program with recourse (SLP-T) may be formulated as follows: 

r 

minimize 

t — 1 W| 

subject to -B^x^ + Afx'f' = 6"', x u % ' >0, ui, € Q, 

SLP-T 

for < = 1,...,T 

where 1 =0. 

A sample point (scenario) in the stage t sample space, Q*, is denoted u t . A stage t > 2 scenario, 
u t , has a unique stage t — 1 ancestor denoted a(u>*), and a stage t < T scenario has a set of stage 
t -f 1 descendant scenarios denoted A(u;*). A stage t realization, = vec(c ^ f , Vf * , Ajf'*), is 

to be read column-wise as a vector in where N t = n t + m t -f m t • n t -\ -f m t • n t \ A^ x is an 
m t x n t matrix and the remaining matrices and vectors are dimensioned to conform. We assume 
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has finite support and a probability mass function given by P {£* = = pf*. For notational 

convenience, we assume a first stage sample space, fii that is a singleton set where represents 
the known state at the time decisions are made in the first stage; clearly, p^ 1 has value one. At 
the time decisions are made in stage /, the observation fj*'* and the previous stage’s decision 
are known to the decision maker; the goal is to find a first stage decision, x\, that minimizes the 
expected cost of operating the system over T stages. 

Pereira and Pinto [27] have proposed a sampling and decomposition-based algorithm for SLP-T 
models with interstage independence of the stochastic parameters and a “manageable” number of 
descendants, |A(a>t)|, for each node in the scenario tree. Note that if T is moderately large, appli- 
cation of exact methods or bounding and approximation schemes (see §1) may be computationally 
impractical. The basic idea behind the algorithm is to compute upper bound estimates by sampling 
paths through the scenario tree on a forward pass, and to compute deterministic outer-linearization 
cuts on a backward pass. Figures 1 and 2 are designed to illustrate this concept. 




A sample path, u = (u>i,u> 2 , specifies exactly one node per stage in the scenario tree 

and has the property that the nodes identify a path from the stage 1 node to a stage T leaf node. 
Nonanticipativity constraints are satisfied on a forward pass along a sample path: The first stage 
problem is solved with only previously generated cut information regarding the future. The first 
stage decision is then passed to the right-hand-side of a randomly selected second stage subproblem, 
sub(u; 2 ), and the process is then repeated from stage 2 to 3. A forward pass along a sample path 
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simulates operation of the system given the cuts collected in previous iterations. A new cut is then 
appended to the set of cuts at each stage on a backward pass along each sample path. Under the 
assumption that the stochastic parameters are interstage independent, the cuts computed for any 
stage t subproblem are valid for all other subproblems on that stage. The algorithm for SLP-T is 
summarized in Figure 3. 



step 0 let k = 0; 

append lower bounding cuts 9 t > —M ti t= 1 , . . . ,T — 1; 

step 1 solve the stage 1 master program and obtain (xf,^); 
let z_ k = C\x k H- 9 k \ 

step 2 select a set of random sample paths S k according to p ^ T ; 
do £ S k 

do t = 2 to T 

form RHS of sub(u> t ): B^ x [x^^] k -f 
solve and obtain [x^']*; 
enddo 
enddo 

let z k = cix k -F c T[ x V\ k \ 

step 3 ifT^ — z} < 0 then stop: xj is the proposed solution; 

step 4 do t = T — 1 downto 1 
do u> G S k 

do v t +i £ A(cjf) 

form RHS of sub(w,+i): 5" ,+, [x“ (t " +,) ]* + 6"^*; 
solve and obtain dual variables; 
enddo 

use dual variables to compute cut; 
append the set of stage t cuts with 9 t — GtX t > g t \ 
enddo 
enddo 

step 5 let k = k + 1; goto step 1; 

Figure 3: Decomposition Algorithm for SLP-T 
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The algorithm of Figure 3 generates valid lower bounds (i.e., they contain no error due to 
sampling) and in the history-dependent sense, sample mean upper bounds on the optimal objective 
function value; applying Theorem 6 in this setting is straightforward. The first stage decision at the 
k th iteration, a:*, is a random variable; it depends on the set of first stage cuts, and these cuts, in 
turn, depend on which sample paths were selected in previous iterations. However, given the history 
of the algorithm through the first k — 1 iterations, lik~ l, the decision x\ is known, and the upper 
bound estimate ~z k is the sum of a deterministic constant, C\x k , and a sample mean of i.i.d.r.v.’s 
with realizations of the form [zif]* — YlJ=2 c V[ x V] k • Thus, the upper bound estimate satisfies a 
history-dependent CLT with mean u k = c\x\ + Ez k and variance cr% k — & ( z \ — £T*) 2 • The lower 
bound, z_ k = / jt, is deterministic when conditioned on Thus the difference random variable 

Dk =J k — z_ k satisfies the history-dependent CLT hypothesis (11) of Theorem 6 where p k = u k - h 
and v\ — v\ k . The random vectors £* = vec(c*, b ty B t ~i, A t ), t = 2,...,T are bounded and if we 
assume, for example, that the subproblems at each stage have bounded primal feasible regions then 
the random variable z k is bounded. Thus the conditional moment generating function hypothesis 
(12) is satisfied (see Proposition 4), and hypothesis (7) then follows. 

We now describe the recommended algorithm for SLP-T based on the stopping rule theory 
of §2. The modified algorithm utilizes the idea of stopping cycles (see Remark 3) which help to 
minimize computational burden. The existing steps of the algorithm in Figure 3 are modified 
as follows. To step 0 we append “define e > 0; let v = 0, n u > 0, p > 0, 0 < a < 1, and 
let /?' = max {2 In a)] , l} ” See Section 2.2 for recommended values of p and the 

corresponding values of In step 2 we select the random sample S k to be of size n u . In step 

3, if the heuristic pre-test is satisfied then a stopping cycle is complete and we go to step 6. The 
additional steps are detailed in Figure 4. (Note that we have redefined t to be the c of Remark 7 so 
that we are employing a relative precision confidence interval.) 

The sample-size formula (8) of Theorem 6 is satisfied by the modified algorithm, and hence the 
asymptotic validity result holds. Note P{pr < > 1 — implies P{It < ^(^t) < b + £&t} > 

1 — a; moreover, in this setting, It is a known lower bound when the algorithm terminates (see 
Remark 5). While ctt is unknown, as a practical matter one may be satisfied that the solution 



18 



step 6 let v = v + 1; and n y = \e 2 (/?' + 2pln 2 i/)] 

step 7 independently select a set of new sample paths S k according to pj T of size n u \ 
do (jj G S k 

do t = 2 to T 

form RHS of sub(u;*): + 6^*; 

solve and obtain 
enddo 
end do 

let ** = c ia; * + Zf =2 #'«']*; 

step 8 if J k — 2 * < 0 then stop: x\ is the proposed solution; otherwise goto step 4 

Figure 4: Additional “Stopping Rule” Steps to the Algorithm for SLP-T 

Xt is of sufficiently high quality based on sample variance estimates observed during the course of 
the algorithm. Alternatively, the resampling procedure described in Section 2.4 may be applied to 
obtain a confidence interval of the form (14); we can also replace 2 * by the known lower bound l? 
in (14). 

4 Empirical Coverage Results 

In this section we present preliminary empirical coverage results that illustrate dangers associated 
with naive stopping rules and show empirical performance of the recommended stopping rules devel- 
oped in §2. The simplistic “test problems” are motivated by Example 1, and we use a pseudo-random 
number generator [30] to directly form the difference random variables. 

In the first group of these problems, the true gap is //* = 2/3 for k = 1, . . . , K and p k = 0 for 
> K + 1, and the pre-specified confidence interval width is e = 1/3. We terminate on the first 
iteration in which the difference random variable drops below zero; if the algorithm stops prior to 
iteration K + 1 it will have stopped incorrectly. The difference random variables are sample means of 
i.i.d.r.v.’s of the form U(— \/3, \/3) + 2/3, where U(a,b) is a uniform random variable on the interval 
(a, b). The choice of (a, b) = (— \/3, \/3) yields <7* = 1. Table 2 depicts the empirical coverage 
results for two separate stopping rules on four test problems with different values of K . The first 
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stopping rule ignores the sequential nature of the problem and uses the fixed sample size that would 
generate an asymptotically valid 95% one-sided confidence interval of width 1/3 for the true gap, if 
the algorithm consisted of only one iteration. We choose a = 0.05 which yields z Q = 1.645 and thus 
n = |^(z cr (Tjt/e) 2 j = 25. In the second stopping rule, we increase the sample size at each iteration 
according to sample-size formula (8). The parameter p is selected from Table 1 for each value of K . 
Table 2 is based on 1000 replications and shows that ignoring the sequential nature of the problem 
does not lead to undercoverage results in this setting until K grows large and formula (8) leads to 
100% coverage in each case. 



K 


n = 25 n k = O{\og* k) 




coverage 


10 


0.997 


1.000 


100 


0.963 


1.000 


1000 


0.692 


1.000 


10000 


0.029 


1.000 



Table 2: Coverage Results: e = 1/3 and // = 2/3 



Next we consider the case in which pk = e = 1/3 for k = 1, . . . , K . Thus the only modification 
to the previous group of test problems is that the difference random variables are sample means of 
i.i.d.r.v.’s of the form U(—V 3, v^3) + 1/3. This is a more demanding test of the stopping rule theory 
because the true gap and the confidence width are identical. The coverage results are summarized 
in Table 3. The coverage results obtained via sample-size formula (8) significantly exceed the 95% 
confidence level in each case. As one might expect, the naive stopping rule has very poor coverage 
results in this setting. 



K 


n = 25 


n k = 0( log' k) 




coverage 


10 


0.613 


0.994 


100 


0.007 


0.995 


1000 


0.000 


0.993 


10000 


0.000 


0.989 



Table 3: Coverage Results: e = 1/3 and p = 1/3 
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5 Summary 



In this paper we developed a stopping rule theory for a class of optimization algorithms that estimate 
upper and lower bounds on the optimal objective function value via sampling. While our immediate 
motivation lies in developing stopping rules for a class of Monte Carlo sampling-based stochastic 
programming algorithms, the theory may also be applicable to other optimization algorithms that use 
simulation techniques. In the main result, we assume that the difference random variables satisfy 
history-dependent central limit theorems and provide appropriate conditions and a sample-size 
formula under which the desired confidence interval for the objective function value of the proposed 
solution is asymptotically valid. We regard the recommended procedure as conservative because: 
(i) underlying convergence properties of the optimization algorithm are ignored in developing the 
methodology; and (ii) the normal tail bound used to derive the sample-size formula is not sharp, 
particularly in the early iterations. Moreover, through well-designed stopping cycles and because of 
the slow growth of the sample size formula, the recommended stopping rules are practical from an 
implementation standpoint. 

The applicability of the stopping rule theory was illustrated on an algorithm for a class of mul- 
tistage stochastic linear programs (Pereira and Pinto [27]) that generate sample mean upper bound 
estimates and deterministically valid lower bounds. In other sampling-based stochastic program- 
ming algorithms for two stage programs (Dantzig and Glynn [6], Infanger [19], Higle and Sen [14]), 
sample mean upper bounds are readily available, but lower bound estimates have proved more dif- 
ficult to analyze. Development and analysis of lower bound estimators for these sampling-based 
algorithms remains an active area of research [15,18,25]; the procedures we have developed here 
should be useful for asymptotically normal estimators. 
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